do_api_query: Reset session on ConnectionError to fix stale connections #913

Open
shawntru04 wants to merge 4 commits into zulip:main from shawntru04:fix/stale-connection-reset-761
shawntru04 (Collaborator) commented Apr 6, 2026

When a network middlebox silently drops an idle TCP connection, the next request raises a ConnectionError. Previously, the retry logic reused the same dead session on every attempt, so the request could fail up to 10 times. Now we close and null out the session on ConnectionError, so ensure_session() creates a fresh one on the next retry.

This differs from #854 in that it reuses the existing error_retry machinery rather than adding a parallel retry system with separate timeout constants and retry counters.
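
The reset-on-error behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `StubSession` and `SketchClient` are hypothetical stand-ins, the built-in `ConnectionError` substitutes for the exception the real HTTP library raises, and a plain loop stands in for the existing error_retry machinery. Only the names `do_api_query` and `ensure_session` and the limit of 10 attempts come from the description.

```python
from typing import Callable, Optional


class StubSession:
    """Hypothetical stand-in for an HTTP session; fails when marked stale."""

    def __init__(self, stale: bool = False) -> None:
        self.stale = stale
        self.closed = False

    def request(self, method: str, url: str) -> dict:
        if self.stale:
            # Simulates a connection that a middlebox dropped while idle.
            raise ConnectionError("connection dropped while idle")
        return {"result": "success"}

    def close(self) -> None:
        self.closed = True


class SketchClient:
    """Illustrative client only; not the real Zulip client class."""

    MAX_RETRIES = 10  # matches the "up to 10 times" in the description

    def __init__(self, session_factory: Callable[[], StubSession]) -> None:
        self.session_factory = session_factory
        self.session: Optional[StubSession] = None

    def ensure_session(self) -> None:
        # Build a session only when none is cached.
        if self.session is None:
            self.session = self.session_factory()

    def do_api_query(self, method: str, url: str) -> dict:
        for _ in range(self.MAX_RETRIES):
            self.ensure_session()
            assert self.session is not None
            try:
                return self.session.request(method, url)
            except ConnectionError:
                # Close and drop the dead session so that the next
                # iteration's ensure_session() creates a fresh one.
                self.session.close()
                self.session = None
        raise ConnectionError("all retries failed")


# Usage: the first session is stale, the second one works.
sessions = iter([StubSession(stale=True), StubSession()])
client = SketchClient(lambda: next(sessions))
response = client.do_api_query("GET", "/api/v1/messages")
```

Without the two lines in the `except` branch, every retry would hit the same stale session and the loop would exhaust all of its attempts.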

Fixes: #761

How did you test this PR?

Added a new automated test in zulip/tests/test_do_api_query.py that simulates a stale connection by having the first session raise a ConnectionError, then verifies that the session is closed and replaced with a fresh one, and that the request succeeds on retry.
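
A test of the shape described above might look like the sketch below. It is not the contents of zulip/tests/test_do_api_query.py: `query_with_reset` is a hypothetical miniature of the retry loop included only so the example runs on its own, and the built-in `ConnectionError` again stands in for the HTTP library's exception.

```python
import unittest
from unittest import mock


def query_with_reset(holder: dict, make_session, retries: int = 10):
    """Hypothetical miniature of the retry loop, for the test to exercise."""
    for _ in range(retries):
        if holder.get("session") is None:
            holder["session"] = make_session()
        try:
            return holder["session"].request("GET", "/api/v1/messages")
        except ConnectionError:
            # Discard the dead session so a new one is created next time.
            holder["session"].close()
            holder["session"] = None
    raise ConnectionError("retries exhausted")


class StaleConnectionTest(unittest.TestCase):
    def test_session_replaced_after_connection_error(self) -> None:
        # First session simulates a stale connection.
        stale = mock.Mock()
        stale.request.side_effect = ConnectionError("stale")
        # Second session answers normally.
        fresh = mock.Mock()
        fresh.request.return_value = {"result": "success"}
        make_session = mock.Mock(side_effect=[stale, fresh])
        holder = {"session": None}

        result = query_with_reset(holder, make_session)

        # The request succeeds on retry.
        self.assertEqual(result, {"result": "success"})
        # The stale session was closed and replaced with the fresh one.
        stale.close.assert_called_once_with()
        fresh.close.assert_not_called()
        self.assertIs(holder["session"], fresh)
        self.assertEqual(make_session.call_count, 2)
```

The assertions mirror the three claims in the test description: the old session is closed, a new one takes its place, and the retried request succeeds.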

Self-review checklist
  • Self-reviewed the changes for clarity and maintainability
    (variable names, code reuse, readability, etc.).

Communicate decisions, questions, and potential concerns.

  • Explains differences from previous plans (e.g., issue description).
  • Highlights technical choices and bugs encountered.
  • Calls out remaining decisions and concerns.
  • Automated tests verify logic where appropriate.

Individual commits are ready for review (see commit discipline).

  • Each commit is a coherent idea.
  • Commit message(s) explain reasoning and motivation for changes.

Completed manual review and testing of the following:

  • Visual appearance of the changes.
  • Responsiveness and internationalization.
  • Strings and tooltips.
  • End-to-end functionality of buttons, interactions and flows.
  • Corner cases, error conditions, and easily imagined bugs.

Development

Successfully merging this pull request may close these issues.

Improve error handling when a request fails due to the persistent connection being expired
